EVENT-CONTROLLED ERROR INJECTION SYSTEM



BACKGROUND OF THE INVENTION 



This invention is in the field of computer systems,
and particularly pertains to the testing of computer
restart, retry and recovery mechanisms by the purposeful
injection of errors into a computer system in order to
provoke and evaluate a restart, retry or recovery mechanism.

In the past art of error injection, errors were 
injected by probing pins or circuit paths in order to 
force connected circuitry to certain states indicating 
the occurrence of errors (USA Patent 4 759 019).
With miniaturization and integration of circuit func- 
tions resulting in a manifold increase of functional- 
ity on a decreasing physical base, specific circuit 
points of interest are usually not available at acces- 
sible locations. Packaging and miniaturization make 
the probing of specific points internal to circuitry 
impractical. Further, the nature of circuit technology 
currently in vogue is not compatible with "OR 
dotting" of an error signal into a circuit. 

Further, the known modes of error injection are 
unsuitable for realistically evaluating computer error 
response. The prior art of error injection is based 
primarily upon error initiation which occurs without 
regard to, or as the result of, circuit operation 
(USA-Patents 4 228 537, 4 308 616, 4 669 081 and 
IBM Technical Disclosure Bulletin of March 1988, 
pages 217 to 219). In this regard, error injection 
may be synchronized with circuit operations in the 
sense that the error injection mechanism responds 
to a clock which also drives the circuit that will 
receive the error. However, the mechanism initiates
the error with total disregard for circuit events.
Therefore, the error is triggered in an arbitrary
manner, without considering the state of the circuit. 

Therefore, there is a manifest need for an error 
injection mechanism in a computer system which 
can simulate computer malfunction by injecting er- 
rors by a means which is compatible with circuit 
fabrication technology, and in a mode which is 
influenced by machine operation. 



SUMMARY OF THE INVENTION 



The inventors have realized that the usefulness 
of error injection is enhanced by operating the 
injection mechanism in response to the processes 
which are innate to the machine being tested. Further,
the inventors have observed that the incorporation
of error injection into machine functionality
is achieved by provision of an error injection
mechanism which can be physically integrated with
the machine being tested. Also important is the
inventors' realization that a close approximation of
the randomness with which true errors occur and
manifest themselves necessitates the provision of a
variety of modes of error injection.

In giving form and function to their invention,
the inventors provide a mechanism for injecting
errors for test and evaluation of a processing machine
in which a plurality of machine events occur
over time. The injection mechanism includes a
machine event collector distributed within the machine
and a programmable event mask circuit distributed
within the machine and connected to the
machine event collector for masking events collected
by the event collector to detect a mask-defined
machine state. A counter is provided which
is connected to the mask circuit for counting the
occurrences of the mask-defined machine state,
and a programmable mode error injector is connected
to the counter for injecting an error into the
machine upon the counter reaching a certain count.
The error is injected according to an intermittent
mode, or according to a continuous mode. Provision
is made in the programmable mode error
injector for selectively delaying the injection of the
error from the occurrence of the state count which
stimulates the error.

A primary objective of this invention, therefore,
is to provide an error injection mechanism which is
merged logically, functionally, and physically with
the processing machine that it serves.

Specifically, this objective is achieved by the
error injection system of the invention, which injects
errors into the processing machine for testing
and evaluating the machine, the injection responding
to preselected states of the machine.

Achievement of this objective and other attendant
advantages and benefits by the practice of
this invention will be appreciated when the following
detailed description is read with reference to
the below-described drawings.



BRIEF DESCRIPTION OF THE DRAWINGS

Figure 1 is a detailed block diagram illustrating
the error injection mechanism of the invention
in a form which is integrated within the processing
machine which it serves.

Figure 2 is a detailed schematic diagram 
illustrating a circuit for selecting a mode of error
injection.

Figure 3 is a schematic diagram of a parity
error generator which operates in response to the
error injection signal generated by the circuit of
Figure 2.

DETAILED DESCRIPTION OF THE PREFERRED 
EMBODIMENT 

In this application, the term "processing ma- 
chine" is synonymous with the term "computer", 
"processor", and "computer facility", or any other 
equivalent term. The term "event" signifies the 
occurrence or existence of a condition affecting 
data in a processing machine, or affecting pro- 
cesses executing in the machine. An event is re- 
presented by the state of a signal generated by 
hardware, software, or firmware of the processing 
machine upon the happening of the condition. Last, 
the term "machine state" refers to the concurrent 
existence of sets of particular conditions within a 
processing machine. "Retry", "restart" and 
"retrieval" mechanisms and processes are given 
the usual meanings; see for example the meanings 
given in Sippl's Computer Dictionary, Fourth Edi- 
tion, 1986. 

As is known, retry and recovery mechanisms 
are provided in processing machines for the pur- 
pose of reacting to a machine error (or "machine 
check") by stopping some, or all, of the processing
activity of the machine, placing it into a "retry" 
state, and starting machine operation from this 
state. The testing of such mechanisms is prob- 
lematical. By their very nature they operate in 
response to events which are pathological, spo- 
radic and unpredictable. The invention provides a 
means for injecting errors which appear to be 
"real" errors by operating in response to alterable 
patterns of machine events. The alterable patterns 
are in the form of masks that are distributed within 
the processing machine and located at the sites 
at which the events they are masking are generated.
The overall form of the error injection system is 
illustrated in Figure 1. 

In Figure 1, a processing machine 8 is illus- 
trated. The processing machine can comprise, for 
example, an I/O processor attached to a mainframe
computer for the exchange of data between the 
main storage of the computer and peripheral de- 
vices. The processing machine 8 exhibits a modu- 
lar design having several levels of definition. The 
highest level of structural definition includes cards 
10 and 12. It is asserted that a card is a modulariz- 
ed, replaceable unit and carries logic in the form of 
any combination of hardware, software or firmware 
assembled to perform some defined high-level 
function or functions of the processing machine. 
Characteristically, the processing machine is 
formed by the integration of a plurality of cards by 
means of a physical backplane structure into which 
the cards can be plugged, and from which the
cards can be removed for replacement.

Typically, a card, such as the card 10, derives
its functionality from a plurality of monolithic integrated
devices, such as chips 16 and 18, mounted
on the card.

The chip 18 is an integrated semiconductor
device that includes a plurality of logic units in the
form of hardwired, programmed, or programmable
circuitry. For example, the chip 18 may include
programmed circuitry forming a state machine
(SM) 20, and a logic circuit 23 which can comprise
a collection of gates or other elementary devices
interconnected to perform a specific function. Although
it is not shown, the chip 18 also includes
other logic ("chip logic") to which the state machine
20 and circuit 23 are connected.

During operation, assume that the state machine
exhibits a condition signified by a signal on
the line 21, which is termed an event. Likewise, the
signal line 24 conducts a signal generated by the
circuit 23 that is representative of another event.

Machine processes and functions are synchronized
by a common clock signal CLK. This signal
is generated conventionally and distributed
throughout the machine down to the chip level. It
has a conventional wave shape consisting of a
succession of pulses at equal intervals.



THE INVENTION

The invention includes a structure that is distributed
within the processing machine 8 and which
includes portions integrated onto chips carried on
the cards forming the processing machine. In this
regard, the elements of the invention which are
provided at the chip logic level are represented by
a programmable mask register 25, gates 32, 33,
35, 36, 37 and a latch 38. For illustration of the
invention, error generation is provided at the chip
level by, for example, a programmable error mask
latch 58 and a gate 56. Although not illustrated, it is
asserted that the chip 16 also includes elements of
the invention which correspond to those listed
above for the chip 18, as do other chips on the card
10 which are not illustrated. Further, it is asserted
that structural elements of the invention are present
also in chips on the card 12.

The invention includes a set of card-level elements
that embraces card gates corresponding to
the gate 42 of card 10. In addition, card level
elements are placed on a card 14 which is termed
a "maintenance" card. The card level elements on
the maintenance card 14 include a programmable
state mask register 45, clock latches 46 and 47,
gates 48, 49, and 51, and a multimode error injection
circuit 53.

The error injection circuit 53 provides a signal 
which is sent back to chip level circuitry of the
processing machine. One set of such chip level 
circuitry includes the gate 56 and the error mask 
latch 58. 

In the invention, the chip-level circuitry connected
to the mask register 25, for example, continuously
monitors the signal lines 21 and 24 to
detect the occurrence of a pattern of events cor- 
responding to a mask in the register 25. When 
such a pattern occurs, a CHIP EVENT signal is 
provided on signal line 40 to a card-level collection 
gate 42, where it is combined using the well-known 
AND function with other chip events on the card 
10. Concurrency of all chip events collected by the 
gate 42 will raise a CARD EVENT signal on the 
signal line 44 that is provided to the card-level 
circuitry on the maintenance card 14. The card- 
level circuitry on the maintenance card 14 collects 
all of the card events and compares them against a 
condition mask in the register 45, which corresponds
to a state of the processing machine 8 that is
of interest. The machine state is indicated by a
positive output from AND 51. The multimode select 
error injection logic 53 responds to occurrences of 
the masked machine state by providing an INJECT 
signal on signal line 54 that is fed back to the chip- 
level error generation circuitry illustrated by gate 
56 and latch 58. The invention also provides for 
masking of error generation in response to an 
INJECT signal so that a selectable pattern of errors 
can be generated in response to the INJECT sig- 
nal. 

It will be evident to those skilled in the art that 
the pattern programmability provided by the mask- 
ing of events, conditions, and error patterns pro- 
vides a wide-ranging, yet subtle, capacity to simu- 
late error conditions. These error conditions can be 
fine-tuned to fully test the intricate retry and restart 
mechanisms characteristic of modern processing
machines. Programmability of the event, machine 
state, and error pattern masks is provided by 
software-level programming access to the mask 
registers. These connections are represented by 
event mask, state condition mask, and error mask 
signal lines 60, 61, and 62, respectively. For the
sake of illustration, these lines originate at a pro- 
cessing entity 9 which is called a support proces- 
sor (SP). The support processor may be an entity 
which is external to the processing machine 8, yet 
which has access to the machine through the software
which controls it. Software access can be
provided, for example, by a generic LOAD IMME- 
DIATE REGISTER command which manipulates 
the mask registers. 

In detail, the chip-level, event masking portion
of the invention includes the event mask register
25, comprised of latches 26 and 27. The positive
outputs of the latches 26 and 27 are fed, respectively,
to AND gates 32 and 35, while the complementary
latch outputs are fed to OR gates 33
and 36. The AND gates 32 and 35 are connected
to the event signal lines 21 and 24, which are
driven, respectively, by the state machine 20 and
the circuit 23. When an event occurs, indicated by
the conditioning of a signal to its positive digital
state, the AND gate receiving the event signal will
provide a positive output only if the corresponding
mask latch has been set. The positive outputs of
the AND gates 32 and 35 are fed forward, respectively,
by the OR gates 33 and 36. If the corresponding
mask latch is not set, the latch's complementary
output will be fed forward by its respective
OR gate to the AND gate 37. Thus, the
AND gate 37 collects and senses the conditions of all
masked events.
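
The masking logic just described can be sketched in software. The
following Python fragment is only an illustrative model, not part of
the disclosed circuitry; the function name chip_event and the use of
boolean lists for the event lines and mask latches are assumptions
made for the sketch.

```python
def chip_event(events, mask):
    """Model the chip-level event mask (hypothetical helper).

    events: booleans, one per event signal line (e.g. lines 21 and 24).
    mask:   booleans, one per mask latch (e.g. latches 26 and 27).
    """
    if len(events) != len(mask):
        raise ValueError("events and mask must have the same length")
    # Per line: the AND gate passes the event only if its mask latch is set;
    # the OR gate then merges in the latch's complementary output, so an
    # unmasked line always presents a positive level.
    or_outputs = [(m and e) or (not m) for e, m in zip(events, mask)]
    # The collecting AND gate (gate 37) needs every OR output to be positive.
    return all(or_outputs)
```

Under this model an all-zero mask yields a positive output regardless
of the events, since every unmasked line is forced positive by the
complementary latch output.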

Positive output of the AND gate 37 signifies the
simultaneous occurrence of all masked events on
the chip 18. The positive output is captured by the
latch 38 at the transition of the CLK signal and
forwarded thereby to the card-level AND gate 42.

The card-level AND gate 42 collects all of the
chip event signals produced on the card 10. When
those signals are logically positive in the same
clock period, the output of the AND gate 42 transitions
positively to produce the CARD EVENT signal
on signal line 44. CARD EVENT signals are collected
in latches 46 and 47 of maintenance card 14
where they are masked in the manner heretofore
described for the chip level circuitry by the combination
of the state mask register 45, and AND/OR
gate combinations 48 and 49. It will be evident that
the mask in the register 45 represents a state of
the machine 8. When the machine state occurs, the
output of the AND gate 51 rises.
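
The collection hierarchy from chip events to card events to the
machine state can be modelled as below. This is a minimal sketch
under stated assumptions: the names masked_and and machine_state, and
the nested-list representation of chip events per card, are
hypothetical and chosen only for illustration.

```python
def masked_and(signals, mask):
    """AND together the signals selected by mask; unselected ones are ignored."""
    if len(signals) != len(mask):
        raise ValueError("signals and mask must have the same length")
    return all((not m) or s for s, m in zip(signals, mask))


def machine_state(chip_events_by_card, state_mask):
    """Return True when the mask-defined machine state is present.

    chip_events_by_card: one list of chip event booleans per card.
    state_mask: one boolean per card (modelling the state mask register).
    """
    # Card level: the collection gate ANDs every chip event on the card.
    card_events = [all(chips) for chips in chip_events_by_card]
    # Maintenance card: the state mask selects which card events matter.
    return masked_and(card_events, state_mask)
```

For example, with two cards and a state mask selecting only the
first, the second card's chip events have no effect on the result.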

As thus far described, the output of the AND
gate 51 is pulsed each time the machine state
represented by the state mask in register 45 occurs.
Since the state depends upon the occurrence
of masked chip-level events, the output of the AND
gate 51 is said to be "event-driven".

The sequence of state occurrence signals output
by the gate 51 is fed to the multimode error
injection circuit 53. The circuit 53 acts to count the
number of state signal occurrences in order to
react in a predetermined manner by producing the
INJECT signal. The predetermined manner of circuit
action results in the production of an INJECT
signal having particular temporal and duration characteristics.
In this regard, the circuit 53 counts
occurrences of the state signal and, upon reaching
a particular count, produces an INJECT signal
which can have the shape of a pulse or which can
be conditioned to a predetermined level until reset.
Further, the circuit can provide the desired INJECT
signal delayed by an amount of time which is
variable with respect to the occurrence of the desired
count. Thus, the INJECT signal can mimic a
fleeting transient error or one which, once occurring,
is unvarying.

As reference to Figure 1 will verify, the
INJECT signal results in production of simulated 
error in a particular chip only if masked at that 
chip. For example, conditioning of the INJECT sig- 
nal with a logically positive transition will raise a 
CHECK signal at the output of the AND gate 56 
only if a chip-level error is masked by setting the 
latch 58. 

Refer now to Figure 2 for an understanding of
the multimode error injection circuit 53. The circuit
53 conventionally conditions a state signal output
by the AND gate 51 through a wave-shaping circuit
consisting of a latch 70 and an AND gate 71. Each
time the output of the AND gate 51 transitions positively,
the AND gate 71 emits a pulse termed the
"STATE" signal. The pulse is provided to an occurrence
counting circuit fed through the AND gate
73. The purpose of the occurrence counting circuit
is to count the number of STATE signal pulses and
to provide an indication when a predetermined
count has been reached. This is done by loading
an initial count into a counting register 75 and
decrementing the contents of the register each
time a STATE signal occurs. Assuming a positive
output from the inverter 76 and a positive state of
the ENABLE signal, each pulse output by the AND
gate 71 causes the output of the AND gate 73 to
pulse. This causes the AND gate 77 to pass the
decremented count of the contents of register 75
back into the register 75 through the path OR 78 -
AND 79 - OR 80. The decremented count is placed
into the register 75, where it is presented to a
conventional decrement and compare circuit 82.
When received by the decrement and compare
circuit, the count is decremented and held in the
circuit. In the circuit 82, the count is compared
against a binary magnitude of zero and a binary
magnitude of 1. If the count equals the value of
zero, a signal is provided on the signal line 83. If
the count has a magnitude of 1, a signal is provided
on signal line 84. For each clock period in
which the output of the AND gate 73 does not
pulse, the output of the register 75 is wrapped back
to its input without decrementing the count, through
the path AND 86 - OR 78 - AND 79 - OR 80. On
this path, the contents of the register 75 are not
decremented. In this case, the output of the decrement
and compare circuit 82 does not change.
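
The behaviour of the occurrence counting circuit can be approximated
by a small software model. The sketch below is a simplification that
ignores gate-level timing; the class name OccurrenceCounter and its
method names are hypothetical, and the two properties merely stand in
for the indications on signal lines 83 and 84.

```python
class OccurrenceCounter:
    """Simplified model of the counting register and its zero/one compare."""

    def __init__(self, initial_count):
        if initial_count < 0:
            raise ValueError("initial_count must be non-negative")
        self.count = initial_count  # value loaded into the counting register

    @property
    def is_zero(self):
        return self.count == 0  # stands in for the signal on line 83

    @property
    def is_one(self):
        return self.count == 1  # stands in for the signal on line 84

    def clock(self, state_pulse, enable=True):
        """Advance one clock period and report whether the count is zero."""
        # Decrement only when a STATE pulse arrives, counting is enabled,
        # and the zero indication has not yet blocked further counting.
        if state_pulse and enable and not self.is_zero:
            self.count -= 1
        # Otherwise the count simply recirculates unchanged.
        return self.is_zero
```

Loading the model with a count of 3 and clocking it with a mixed
sequence of pulses shows the zero indication appearing on the third
STATE pulse and remaining thereafter.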

The register 75 is initialized to a count through 
the signal path AND gate 87-OR gate 80, when the 
signal RMAP MODE (E3) is logically positive. The 
count entered into the register 75 is equal to the 
numerical digital value of the signal RMAP DATA. 

When the contents of the register 75 have
been decremented to zero, the signal on line 83 disables
the AND gate 73 through the inverter 76, but
enables the AND gate 74, assuming that the output
of the inverter 90 and the ENABLE and
EVENT/CLOCK signals are logically positive. Now,
the CLK signal provided to the register 91 causes
the register's contents to decrement by one within
each CLK signal cycle through a circuit identical to
that just described for counting signal occurrence.
This circuit decrements the register through the
path 92 (decrement and compare) - 93a (AND) -
93b (OR) - 94a (AND) - 94b (OR). It is observed
that the register 91 is programmable by way of the
AND gate 95, assuming a positive state for the
RMAP MODE (E4) signal, in which case the register
91 will be initialized to the digital value of
RMAP DATA. It should be evident that the clock
count decrement circuit counts CLK occurrences
only after a succession of state signal occurrences
have decremented the count in register 75 to zero.

It is asserted that if the INJECT signal is to be
raised, it will be raised either when the count of the
register 75 goes to zero, or delayed from that event
by a number of CLK pulses equal to the number in
the register 92. It will be appreciated, therefore,
that the INJECT signal can be delayed by a variable
amount of time from a predetermined compound
occurrence corresponding to the count in
the register 75.
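
The relationship between the occurrence count, the optional clock
delay, and the moment at which INJECT rises can be illustrated with
the following Python sketch. It is a behavioural approximation only:
the function name inject_cycle and its parameters are hypothetical,
one list element is assumed to represent one CLK period, and latch
pipeline delays are ignored.

```python
def inject_cycle(state_pulses, occurrence_count, delay_clocks=0, event_only=True):
    """Return the clock index at which INJECT would rise, or None.

    state_pulses:     booleans, one per CLK period (True = STATE pulse).
    occurrence_count: number of STATE pulses to count (at least 1).
    delay_clocks:     CLK periods to wait after the count is exhausted.
    event_only:       True models injection at the counted event itself.
    """
    if occurrence_count < 1 or delay_clocks < 0:
        raise ValueError("occurrence_count >= 1 and delay_clocks >= 0 required")
    remaining = occurrence_count
    delay = delay_clocks
    for cycle, pulse in enumerate(state_pulses):
        if remaining > 0:
            if pulse:
                remaining -= 1
                # Event-only mode (or a zero delay) injects immediately.
                if remaining == 0 and (event_only or delay == 0):
                    return cycle
            continue
        # Occurrence count exhausted: now count CLK periods instead.
        delay -= 1
        if delay == 0:
            return cycle
    return None
```

With STATE pulses in periods 1, 3 and 4, a count of 2 gives injection
in period 3 in the event-only case, and in period 6 when a delay of
3 clocks is selected.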

When the INJECT signal is to be generated
only at the occurrence of the compound event signified
by reducing the count in the register 75 to zero,
the latch 96 is set, its positive output (EVENT
ONLY) being provided to AND gate 97 and its
complementary output being provided as the
EVENT/CLOCK signal to AND gate 74. Assuming
positive values for the RESET and ENABLE signals,
the output of the AND gate 97 will rise with
the first positive transition of the AND gate 71
following a decrement of the count in the register
75 to one. Positive transition of the output of the
AND gate 97 is provided to the error injector latch
98 by way of the signal path through the OR gate
100, one of the two AND gates 101 or 102, and the
OR gate 104. It will be evident that the output of
the error inject latch 98 will rise in response to that
transition of the AND gate 71 which finally decrements
the count of the register 75 to zero
because that transition is also combined by the
AND gate 97 with the signal on signal line 84
indicating that the count has been decremented to
a magnitude of one. Therefore, the INJECT signal
is provided at the output of the latch 98 concurrently
with the zero count being indicated on the
signal line 83.

Alternatively, if the latch 96 is reset, the
EVENT/CLOCK signal will be positive, enabling the
AND gate 105. Again, assuming positive levels for
the RESET and ENABLE signals, the output of
AND gate 105 will transition positively following
decrementation of the count in the register 92 to a
value of 1. Since the signal path through the output
of the AND gate 105 will be the same as that
described for the gate 97, the INJECT signal will
rise concurrently with decrementation of the count
in the register 92 to zero.

The shape of the INJECT signal is determined 
by the state of the latch 110. When the latch 110 is
set, and AND gate 102 is enabled, the output of the 
gate will rise in response to a positive transition at 
the output of the OR gate 111. The OR gate 111 
receives the output of the OR gate 100 and the 
positive output of the edge-triggered SET-RESET 
latch 113. When a positive transition is taken by 
the output of the AND gate 97 or the AND gate 
105, the output of the OR gate 100 rises, setting 
the latch 113. The OR gate 111 merges the pulse 
output of the gate 100 and the positive output of 
the latch 113 to provide a signal which transitions 
to a positive level in synchronism with the transition 
of the output of the OR gate 100 and then stays at 
the positive level until the latch 113 is reset. 

Alternatively, assume that the latches 110 and
113 are reset. Now, the AND gate 101 receives the
positive levels of the complementary outputs of the
latches 110 and 113 and pulses positively in response
to the pulse output by the OR gate 100.
The pulse is fed through the OR gate 104 to the
latch 98.

Obviously, the output of the latch 98 (the IN- 
JECT signal) will follow the input provided through 
the OR gate 104, with the INJECT signal pulsing 
with a signal provided through AND gate 101 or 
transitioning to a positive level with the output of 
the AND gate 102. 
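
The two INJECT signal shapes can be contrasted with a short
behavioural model. This sketch assumes one list element per clock
period and does not model the RESET signal; the function name
inject_waveform and the flag name continuous are hypothetical.

```python
def inject_waveform(trigger, continuous):
    """Shape the INJECT signal from per-period trigger pulses.

    trigger:    booleans, modelling the pulse output of the OR gate 100.
    continuous: True models the mode latch set (level output);
                False models it reset (pulse output).
    """
    held = False  # models the set-reset latch that remembers a trigger
    waveform = []
    for pulse in trigger:
        if continuous:
            # The trigger pulse and the held level are merged, so the
            # output rises with the pulse and then stays positive.
            held = held or bool(pulse)
            waveform.append(held)
        else:
            # The pulse is passed through unchanged.
            waveform.append(bool(pulse))
    return waveform
```

For a single trigger in the second period, the pulse shape returns to
zero in the third period while the level shape stays positive.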

An example of a simulated error which occurs 
in response to an INJECT signal is illustrated in 
Figure 3. Assume in Figure 3 that the INJECT 
signal is staged onto a card through an error stag- 
ing latch 120. The output of latch 120 is fed to 
chips i, j and k on the card for generation of 
injected errors. On chip k, the error condition, if
enabled by the mask bit and the latch 122, will
configure parity generation circuitry consisting of 
gates 130-133 to incorrectly generate parity. If the 
error mask bit is reset, the generated parity signal 
is given correctly by the output of the AND gate 
130 to the OR gate 133 and the latch 135. If the 
error mask bit is enabled, the parity bit generated 
will be inverted from its correct sense by one of
the gates 131 or 132. 
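
The effect of the injected error on parity generation can be sketched
as follows. The text does not state whether even or odd parity is
used, so even parity is assumed here purely for illustration; the
function name generate_parity and its parameters are likewise
hypothetical.

```python
def generate_parity(data_bits, inject=False, error_mask=False):
    """Return the parity bit for data_bits, inverted when an error is injected.

    data_bits:  iterable of 0/1 values.
    inject:     models the staged INJECT signal reaching the chip.
    error_mask: models the chip's error mask bit being set.
    """
    # Assumed even parity: the parity bit makes the total count of ones even.
    parity = sum(data_bits) % 2
    # The parity bit is inverted only when INJECT is present and the
    # error mask bit enables error generation on this chip.
    if inject and error_mask:
        parity ^= 1
    return parity
```

A chip whose error mask bit is reset therefore continues to generate
correct parity even while INJECT is asserted.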

One will appreciate that provision of a pulsed
INJECT signal upon the counting of STATE signal
occurrences will result in intermittent injection of
errors, with each injection occurring each time the
STATE signal pulses for a number of times equal
to the preset value in the register 75. This can be
termed an "intermittent" mode of error injection.
On the other hand, if the latch 110 is set, the
INJECT signal will be continuously asserted, resulting
in the injection of masked errors with each CLK
transition following the setting of the latch 113. This
is referred to as a "continuous" mode of error
injection.

Programmability of the event-controlled error
injection system of this invention is provided
through access of the SP 9 to the mask registers
25, 45, and 58 of Figure 1. The SP 9 is also
connected by means not illustrated to the count
registers 75 and 91 by way of the RMAP MODE
(E3), RMAP MODE (E4) and RMAP DATA signals.
Last, the SP 9 also provides the error mode selection
by conventional programming interfaces to the
latches 96, 110, and through circuitry not illustrated
to the signals ENABLE and RESET.

Obviously, many applications and variations of
this invention will occur to those skilled in the art,
and may be practiced without departing from the
spirit of this invention and without avoiding the
scope of the following claims.


Claims 

1. In a processing machine in which a plurality
of machine events occur over time, a system for
injecting simulated errors into said machine for test
and evaluation of machine processes, comprising:
a machine event detector in said machine;
a programmable mask means in said machine and
connected to said machine event detector for
masking machine events detected by said detector
to detect the occurrence of a mask-defined machine
state;
count means connected to said mask means for
counting occurrences of said state; and
programmable error injection means, connected to
said count means, for injecting an error into said
machine upon said count means reaching a certain
count, said error being injected according to an
intermittent mode, or according to a continuous
mode.

2. The system of Claim 1, further including: 
delay means in said programmable error injection 
means for selectably delaying the injection of said 
error from said certain count.
